
    Cache-Oblivious Iterated Predecessor Queries via Range Coalescing

    In this paper we develop an optimal cache-oblivious data structure that solves the iterated predecessor problem. Given k static sorted lists L_1, L_2, …, L_k of average length n and a query value q, the iterated predecessor problem is to find the largest element in each list that is less than q. Our solution to this problem, called "range coalescing", requires O(log_{B+1} n + k/B) memory transfers for a query on a cache of block size B, which is information-theoretically optimal. The range-coalescing data structure consumes O(kn) space, and preprocessing requires only O(kn/B) memory transfers with high probability, given a tall cache of size M = Ω(B²).
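    For concreteness, here is a minimal Python sketch of the problem itself, answered the naive way with one independent binary search per list. The function name and layout are illustrative assumptions; this baseline costs roughly k separate O(log n) searches per query and is the approach that range coalescing improves to O(log_{B+1} n + k/B), not the paper's data structure.

```python
from bisect import bisect_left

def iterated_predecessor(lists, q):
    """Naive baseline: one binary search per sorted list."""
    answers = []
    for lst in lists:                    # each lst is sorted ascending
        i = bisect_left(lst, q)          # first index with lst[i] >= q
        answers.append(lst[i - 1] if i > 0 else None)  # largest element < q
    return answers

# Example with k = 3 lists and query q = 10.
lists = [[1, 4, 9, 15], [2, 10, 11], [3, 5, 8, 20]]
print(iterated_predecessor(lists, 10))   # -> [9, 2, 8]
```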

    Parallel algorithms for scheduling data-graph computations

    Thesis: Ph.D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. Includes bibliographical references (pages 169–182).

    A data-graph computation, popularized by such programming systems as Pregel, GraphLab, Galois, Ligra, PowerGraph, and GraphChi, is an algorithm that iteratively performs local updates on the vertices of a graph. During each round of a data-graph computation, a user-supplied update function atomically modifies the data associated with a vertex as a function of the vertex's prior data and that of adjacent vertices. A dynamic data-graph computation updates only an active subset of the vertices during a round, and those updates determine the set of active vertices for the next round.

    In this thesis, I explore two ways of scheduling deterministic parallel data-graph computations that provide performance guarantees, culminating in theoretical contributions to graph theory and practical, high-performance systems. In particular, I describe a system called Prism, which processes dynamic and static data-graph computations on arbitrary graphs using a technique called chromatic scheduling. Using a vertex coloring to identify independent sets of vertices, which may be safely processed in parallel, Prism serializes through the colors and processes each independent set in parallel, thus executing data-graph computations deterministically and without the use of costly atomic instructions (e.g., compare-and-swap). Prism supports dynamic data-graph computations deterministically and work-efficiently through the introduction of multibag and multivector data structures.

    Prism requires a vertex coloring, and since graphs are generally not supplied with one, it is necessary to compute one as a preprocessing step. Furthermore, the runtime of Prism is linear in the number of colors, which motivates a study in this thesis of fast parallel coloring algorithms that produce vertex colorings with few colors in practice. At the core of the analysis of these coloring algorithms lies a new result about the maximum depth of a random priority dag: the dag that results from randomly ordering the vertices and directing each edge from its lower-numbered to its higher-numbered endpoint in that order. In particular, when the largest degree Δ in the graph G = (V, E) is less than ln|V|, I show a tight bound of Θ(ln|V| / ln(e ln|V| / Δ)) on the longest path, with high probability. When Δ is greater than ln|V|, the longest path in the dag is simply Θ(min{Δ, √|E|}), also with high probability.

    I also present a system called Laika, which processes data-graph computations for the special, but important, case of graphs representing physical simulations. Such graphs typically have vertices with coordinates in 3D space, connected to other "nearby" vertices. Laika takes advantage of these two properties to execute physical simulations, cast as data-graph computations, in a way that makes efficient use of cache resources. I analyze a contrived graph construction, a random cube graph, as a proxy for the mesh graphs that arise in physical simulations: n vertices are assigned positions uniformly at random in the unit cube and have edges connecting them to every other vertex within a distance r = O(|V|^{-1/3}). For such a graph and a cache sufficiently large to hold M vertices, I improve on previous theory to show that a fraction O(M^{-1/3}) of edges connect to vertices not in the cache, whereas previous theory held that this "miss rate" is O(M^{-1/4}). Laika also guarantees linear speedup for any random cube graph G = (V, E) with constant average degree for any number of workers P = O(|V| / lg²|V|).

    by William Cleaburn Hasenplaugh.
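    The priority dag analyzed above is simple to construct explicitly. The following Python sketch (an illustration under assumed interfaces, not code from the thesis) draws a random vertex ordering, directs every edge from its lower- to its higher-ranked endpoint, and computes the dag's depth by dynamic programming in rank order; the bounds quoted above describe how this depth behaves with high probability. Roughly speaking, in parallel greedy coloring algorithms of the kind the thesis studies, this depth governs the number of parallel rounds, which is why a tight bound on it matters.

```python
import random

def random_priority_dag_depth(n, edges):
    """Depth (longest path, in edges) of a random priority dag."""
    order = list(range(n))
    random.shuffle(order)                # random priorities on the vertices
    rank = [0] * n
    for i, v in enumerate(order):
        rank[v] = i
    out = [[] for _ in range(n)]         # orient edges low rank -> high rank
    for u, v in edges:
        a, b = (u, v) if rank[u] < rank[v] else (v, u)
        out[a].append(b)
    depth = [0] * n                      # rank order is a topological order
    for v in order:
        for w in out[v]:
            depth[w] = max(depth[w], depth[v] + 1)
    return max(depth) if n else 0

# Example: depth of a random priority dag over a 5-cycle.
print(random_priority_dag_depth(5, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]))
```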

    Executing Dynamic Data-Graph Computations Deterministically Using Chromatic Scheduling

    A data-graph computation, popularized by such programming systems as Galois, Pregel, GraphLab, PowerGraph, and GraphChi, is an algorithm that performs local updates on the vertices of a graph. During each round of a data-graph computation, an update function atomically modifies the data associated with a vertex as a function of the vertex's prior data and that of adjacent vertices. A dynamic data-graph computation updates only an active subset of the vertices during a round, and those updates determine the set of active vertices for the next round.

    This article introduces Prism, a chromatic-scheduling algorithm for executing dynamic data-graph computations. Prism uses a vertex coloring of the graph to coordinate updates performed in a round, precluding the need for mutual-exclusion locks or other nondeterministic data synchronization. Prism uses a multibag data structure to maintain the dynamic set of active vertices as an unordered set partitioned by color.

    We analyze Prism using work-span analysis. Let G = (V, E) be a degree-Δ graph colored with χ colors, and suppose that Q ⊆ V is the set of active vertices in a round. Define size(Q) = |Q| + ∑_{v ∈ Q} deg(v), which is proportional to the space required to store the vertices of Q using a sparse-graph layout. We show that a P-processor execution of Prism performs the updates in Q using O(χ(lg(Q/χ) + lg Δ) + lg P) span and Θ(size(Q) + P) work.

    These theoretical guarantees are matched by good empirical performance. To isolate the effect of the scheduling algorithm on performance, we modified GraphLab to incorporate Prism and studied seven application benchmarks on a 12-core multicore machine. Prism executes the benchmarks 1.2 to 2.1 times faster than GraphLab's nondeterministic lock-based scheduler while providing deterministic behavior.

    This article also presents Prism-R, a variation of Prism that executes dynamic data-graph computations deterministically even when updates modify global variables with associative operations. Prism-R satisfies the same theoretical bounds as Prism, but its implementation is more involved, incorporating a multivector data structure to maintain a deterministically ordered set of vertices partitioned by color. Despite its additional complexity, Prism-R is only marginally slower than Prism: on the seven application benchmarks studied, Prism-R incurs a 7% geometric-mean overhead relative to Prism.
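    The chromatic-scheduling loop at the heart of Prism is compact enough to sketch. The Python below is an illustrative, sequential rendering of one round under assumed interfaces (the names prism_round, colors, and update are hypothetical, not the article's implementation): active vertices are partitioned by color into a multibag-like structure and the scheduler serializes through the colors. Because each color class is an independent set, every update writes only its own vertex while reading neighbor data that no same-color update writes, so a real implementation can run each inner loop in parallel without locks.

```python
def prism_round(graph, colors, data, active, update):
    """One chromatic-scheduling round (sequential sketch)."""
    num_colors = max(colors) + 1
    bags = [[] for _ in range(num_colors)]   # multibag: active set by color
    for v in active:
        bags[colors[v]].append(v)
    next_active = set()
    for c in range(num_colors):              # serialize through the colors
        for v in bags[c]:                    # independent set: parallel-safe
            data[v], activated = update(v, data, graph)
            next_active.update(activated)
    return next_active

# Toy demo: shortest hop counts from vertex 0 on a 2-colored 4-cycle.
graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
colors = [0, 1, 0, 1]
data = {0: 0, 1: 10, 2: 10, 3: 10}

def update(v, data, graph):
    new = min([data[v]] + [data[w] + 1 for w in graph[v]])
    return new, [w for w in graph[v] if data[w] > new + 1]

active = {1, 3}
while active:
    active = prism_round(graph, colors, data, active, update)
print(data)   # -> {0: 0, 1: 1, 2: 2, 3: 1}
```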